NPE caused by undefined object#177

Open
Trisia wants to merge 2 commits into apache:trunk from Trisia:trunk

@Trisia

Trisia commented Feb 1, 2024

example:

125 0 obj
<</Tabs/S/Group<</S/Transparency/Type/Group/CS/DeviceRGB>>/Contents[69 0 R 3646 0 R 70 0 R]/Type/Page/QITE_pageid<</UF 3619 0 R/P 5/D(AA\r)/F 3620 0 R/I 3621 0 R>>/Resources<</ExtGState<</Xi10 1 0 R/GS7 3640 0 R/GS8 3641 0 R>>/ProcSet[/PDF/Text/ImageB/ImageC/ImageI]/Font<</F7 3633 0 R/F8 3624 0 R/F9 3642 0 R/F1 3627 0 R/F2 3628 0 R/F3 3629 0 R/Xi11 2 0 R>>>>/Parent 82 0 R/StructParents 5/MediaBox[0 0 595.2 841.92]>>

See this entry:

QITE_pageid<</UF 3619 0 R/P 5/D(AA\r)/F 3620 0 R/I 3621 0 R>>

Object 3621 0 R is referenced but not defined.

When I use PDDocument.saveIncremental to save the document, it fails with the following error:

java.lang.NullPointerException
	at java.util.Hashtable.computeIfAbsent(Hashtable.java:1004)
	at org.apache.pdfbox.pdfwriter.COSWriter.getObjectKey(COSWriter.java:1089)
	at org.apache.pdfbox.pdfwriter.COSWriter.writeReference(COSWriter.java:1367)
	at org.apache.pdfbox.pdfwriter.COSWriter.visitFromDictionary(COSWriter.java:1207)
	at org.apache.pdfbox.pdfwriter.COSWriter.writeDictionary(COSWriter.java:1155)
	at org.apache.pdfbox.pdfwriter.COSWriter.visitFromDictionary(COSWriter.java:1202)
	at org.apache.pdfbox.cos.COSDictionary.accept(COSDictionary.java:1265)
	at org.apache.pdfbox.pdfwriter.COSWriter.doWriteObject(COSWriter.java:610)
	at org.apache.pdfbox.pdfwriter.COSWriter.doWriteObject(COSWriter.java:643)
	at org.apache.pdfbox.pdfwriter.COSWriter.doWriteObjects(COSWriter.java:540)
	at org.apache.pdfbox.pdfwriter.COSWriter.doWriteBody(COSWriter.java:450)
	at org.apache.pdfbox.pdfwriter.COSWriter.visitFromDocument(COSWriter.java:1299)
	at org.apache.pdfbox.cos.COSDocument.accept(COSDocument.java:413)
	at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1568)
	at org.apache.pdfbox.pdmodel.PDDocument.saveIncremental(PDDocument.java:1078)

So we can skip objects that are not defined, which makes saving work.
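The top frame of the trace is consistent with a null key reaching `Hashtable.computeIfAbsent`, since `Hashtable` does not accept null keys. The following is a minimal sketch in plain Java that reproduces that behaviour and shows the kind of null guard described above. It is not PDFBox code: the class `NullKeyDemo` and both method names are hypothetical, and the guard is only an illustration, not the actual patch in this pull request.

```java
import java.util.Hashtable;
import java.util.Map;

final class NullKeyDemo {
    /** Returns true if computeIfAbsent throws a NullPointerException for the given key. */
    static boolean lookupThrowsNpe(Map<Object, Long> keys, Object key) {
        try {
            keys.computeIfAbsent(key, k -> 1L);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    /** Hypothetical guard: skip the lookup when there is no object to look up. */
    static Long lookupOrSkip(Map<Object, Long> keys, Object key) {
        if (key == null) {
            return null; // nothing to reference; the caller decides what to write instead
        }
        return keys.computeIfAbsent(key, k -> 1L);
    }

    public static void main(String[] args) {
        Map<Object, Long> keys = new Hashtable<>();
        System.out.println("non-null key throws: " + lookupThrowsNpe(keys, "obj"));
        System.out.println("null key throws:     " + lookupThrowsNpe(keys, null));
        System.out.println("guarded null lookup: " + lookupOrSkip(keys, null));
    }
}
```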

@THausherr

Contributor

Does this problem also occur with 2.0? Does it also occur in ordinary saving? Can you share the file?

@Trisia

Trisia commented Feb 2, 2024

Author

The issue occurred with PDFBox 3.0.0, and I haven't tested it with 2.X.X. In fact, the PDF is encrypted and doesn't allow editing. I'm sorry, but for certain reasons, I can't provide the PDF file.

@THausherr

Contributor

I had a look at 2.0. One of the changes isn't needed there (2.0 doesn't use computeIfAbsent because it doesn't exist in the JDK, and it also avoids using a null, so maybe this was introduced in refactoring); the other ones are. Your changes look useful, so I'll commit them next week to give time for other opinions (COSWriter is a difficult class). I'll also add some logging.

@lehmi

lehmi commented Feb 3, 2024

Contributor

We have to ensure that the resulting PDF isn't (more) corrupt than the original one if indirect object references are omitted. Removing a reference from a COSArray shouldn't be a big problem (the object reference is simply missing), but removing a reference from a COSDictionary without removing the key will lead to a corrupt PDF. In such cases the key should be removed as well, or the corrupt reference should be replaced by a COSNull object.
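To make the two dictionary options concrete, here is a simplified model using plain Java collections rather than PDFBox's COS classes. Everything in it is a stand-in introduced for illustration: `Ref` models an indirect reference, `PDF_NULL` models the null object, and `defined` is the assumed set of object numbers that actually exist in the file.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class DanglingRefCleanup {
    /** Stand-in for an indirect reference such as "3621 0 R". */
    static final class Ref {
        final int objectNumber;
        Ref(int objectNumber) { this.objectNumber = objectNumber; }
    }

    /** Stand-in for the PDF null object. */
    static final Object PDF_NULL = new Object();

    static boolean isDangling(Object value, Set<Integer> defined) {
        return value instanceof Ref && !defined.contains(((Ref) value).objectNumber);
    }

    /** Array case: the dangling reference is simply dropped. */
    static void cleanArray(List<Object> array, Set<Integer> defined) {
        array.removeIf(value -> isDangling(value, defined));
    }

    /** Dictionary case: either remove the key too, or keep it with a null value. */
    static void cleanDictionary(Map<String, Object> dict, Set<Integer> defined, boolean removeKey) {
        Iterator<Map.Entry<String, Object>> it = dict.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Object> entry = it.next();
            if (!isDangling(entry.getValue(), defined)) {
                continue;
            }
            if (removeKey) {
                it.remove();              // key and value go together
            } else {
                entry.setValue(PDF_NULL); // key stays, value becomes null
            }
        }
    }
}
```

In both dictionary variants a key never ends up without a value, which is the corrupt state the comment warns about.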

@THausherr

Contributor

In that case I'd really like to test this with a file. I forgot to mention yesterday that I tried to modify a PDF by "blanking" an object, then loaded it and called saveIncremental(), but no problem occurred.

@Trisia

Trisia commented Feb 4, 2024

Author

I thought the issue was caused by document encryption, but after a simple test, I discovered it was not the case. The document seems to have been generated using iText 5.5.8.

0003719771 00000 n 
0003759924 00000 n 
trailer
<</Info 5 0 R/Encrypt 7894 0 R/ID [<e3d8be7479575ba4b5622d7b671d2255><64acf33dec0b965797b470cbf8346962>]/Root 75 0 R/Size 7895>>
%iText-5.5.8
startxref
3760055
%%EOF

Perhaps the original document had already lost the object references.

Maybe removing lost object references from a COSArray is a good idea.
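One rough way to check whether a file already contains references to undefined objects is to compare every `N G R` reference against the `N G obj` definitions in the raw text. The sketch below is only a heuristic under stated assumptions: it ignores the xref table, cannot see objects stored in compressed object streams, and may report false hits from binary stream data. The class name `DanglingRefScan` is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DanglingRefScan {
    // "125 0 obj" style definitions and "3621 0 R" style references
    private static final Pattern DEFINITION = Pattern.compile("\\b(\\d+)\\s+(\\d+)\\s+obj\\b");
    private static final Pattern REFERENCE = Pattern.compile("\\b(\\d+)\\s+(\\d+)\\s+R\\b");

    /** Returns "number generation" keys that are referenced but never defined. */
    static Set<String> findDangling(String pdfText) {
        Set<String> defined = collect(DEFINITION, pdfText);
        Set<String> dangling = collect(REFERENCE, pdfText);
        dangling.removeAll(defined);
        return dangling;
    }

    private static Set<String> collect(Pattern pattern, String text) {
        Set<String> result = new TreeSet<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            result.add(matcher.group(1) + " " + matcher.group(2));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // ISO-8859-1 maps every byte to one char, so binary content does not break decoding
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        String text = new String(bytes, StandardCharsets.ISO_8859_1);
        findDangling(text).forEach(ref -> System.out.println("dangling: " + ref + " R"));
    }
}
```

Run against the original file, a line such as `dangling: 3621 0 R` would suggest the reference was already broken before PDFBox touched the document.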

@THausherr

Contributor

https://issues.apache.org/jira/browse/PDFBOX-5717 is the same problem

@THausherr

Contributor

Please test whether your code works with the recent changes by @lehmi with the latest snapshot build:
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/3.0.2-SNAPSHOT/

3 participants